Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 98826 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 17.1 MiB |
| Average record size in memory | 181.8 B |
Variable types
| NUM | 15 |
|---|---|
| CAT | 1 |
Reproduction
| Analysis started | 2020-09-17 12:12:31.843332 |
|---|---|
| Analysis finished | 2020-09-17 12:13:25.931131 |
| Duration | 54.09 seconds |
| Version | pandas-profiling v2.7.1 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| dob_year is highly correlated with age | High correlation |
| age is highly correlated with dob_year | High correlation |
| mobile_likes_received is highly correlated with likes_received | High correlation |
| likes_received is highly correlated with mobile_likes_received and 1 other fields | High correlation |
| www_likes_received is highly correlated with likes_received | High correlation |
| likes_received is highly skewed (γ1 = 112.0153748) | Skewed |
| mobile_likes_received is highly skewed (γ1 = 107.4720743) | Skewed |
| www_likes_received is highly skewed (γ1 = 126.1906692) | Skewed |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
| userid has unique values | Unique |
| friend_count has 1962 (2.0%) zeros | Zeros |
| friendships_initiated has 2994 (3.0%) zeros | Zeros |
| likes has 22285 (22.5%) zeros | Zeros |
| likes_received has 24400 (24.7%) zeros | Zeros |
| mobile_likes has 35002 (35.4%) zeros | Zeros |
| mobile_likes_received has 29964 (30.3%) zeros | Zeros |
| www_likes has 60935 (61.7%) zeros | Zeros |
| www_likes_received has 36825 (37.3%) zeros | Zeros |
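The percentages in the Zeros warnings follow from dividing each zero count by the 98826 observations reported under Dataset statistics. A minimal sketch that recomputes them from the figures above (the names `n_observations`, `zero_counts` and `zero_share` are illustrative and not part of the report):

```python
# Number of observations, copied from the "Dataset statistics" table.
n_observations = 98826

# Zero counts copied from the Zeros warnings above.
zero_counts = {
    "friend_count": 1962,
    "friendships_initiated": 2994,
    "likes": 22285,
    "likes_received": 24400,
    "mobile_likes": 35002,
    "mobile_likes_received": 29964,
    "www_likes": 60935,
    "www_likes_received": 36825,
}

# Share of zeros per variable, as a percentage rounded to one decimal place.
zero_share = {
    name: round(100 * count / n_observations, 1)
    for name, count in zero_counts.items()
}

for name, pct in zero_share.items():
    print(f"{name}: {pct}%")
```

The recomputed shares agree with the percentages quoted in the warnings, for example 61.7% for www_likes.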
df_index
Real number (ℝ≥0)
| Distinct count | 98826 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49497.453939246756 |
|---|---|
| Minimum | 0 |
| Maximum | 99002 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4943.25 |
| Q1 | 24741.25 |
| median | 49502.5 |
| Q3 | 74251.75 |
| 95-th percentile | 94053.75 |
| Maximum | 99002 |
| Range | 99002 |
| Interquartile range (IQR) | 49510.5 |
Descriptive statistics
| Standard deviation | 28582.95239 |
|---|---|
| Coefficient of variation (CV) | 0.5774630837 |
| Kurtosis | -1.200185093 |
| Mean | 49497.45394 |
| Median Absolute Deviation (MAD) | 24755.5 |
| Skewness | -6.10324753e-05 |
| Sum | 4891635383 |
| Variance | 816985167.2 |
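Several of these statistics are simple functions of the others. As a quick consistency check, the sketch below recomputes the mean, coefficient of variation, interquartile range, range and variance from the values reported in the tables above (variable names are illustrative):

```python
# Values copied from the tables above.
n = 98826                       # number of observations
total = 4891635383              # Sum
std = 28582.95239               # Standard deviation
q1, q3 = 24741.25, 74251.75     # Q1 and Q3
minimum, maximum = 0, 99002     # Minimum and Maximum

mean = total / n                # Mean = Sum / number of observations
cv = std / mean                 # Coefficient of variation = std / mean
iqr = q3 - q1                   # Interquartile range = Q3 - Q1
value_range = maximum - minimum # Range = Maximum - Minimum
variance = std ** 2             # Variance = std squared

print(mean, cv, iqr, value_range, variance)
```

Small differences in the last digits are expected, because the inputs are the rounded values shown in the report.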
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2047 | 1 | < 0.1% | |
| 60040 | 1 | < 0.1% | |
| 74417 | 1 | < 0.1% | |
| 80562 | 1 | < 0.1% | |
| 78515 | 1 | < 0.1% | |
| 68276 | 1 | < 0.1% | |
| 66229 | 1 | < 0.1% | |
| 72374 | 1 | < 0.1% | |
| 70327 | 1 | < 0.1% | |
| 92856 | 1 | < 0.1% | |
| Other values (98816) | 98816 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 99002 | 1 | < 0.1% | |
| 99001 | 1 | < 0.1% | |
| 99000 | 1 | < 0.1% | |
| 98999 | 1 | < 0.1% | |
| 98998 | 1 | < 0.1% |
userid
Real number (ℝ≥0)
| Distinct count | 98826 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1597069.4826867424 |
|---|---|
| Minimum | 1000008 |
| Maximum | 2193542 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 1000008 |
|---|---|
| 5-th percentile | 1060690 |
| Q1 | 1298868.25 |
| median | 1596225 |
| Q3 | 1895572.5 |
| 95-th percentile | 2133377.25 |
| Maximum | 2193542 |
| Range | 1193534 |
| Interquartile range (IQR) | 596704.25 |
Descriptive statistics
| Standard deviation | 344011.4207 |
|---|---|
| Coefficient of variation (CV) | 0.2154016619 |
| Kurtosis | -1.199245301 |
| Mean | 1597069.483 |
| Median Absolute Deviation (MAD) | 298340 |
| Skewness | 6.817747206e-06 |
| Sum | 1.578319887e+11 |
| Variance | 1.183438576e+11 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1159224 | 1 | < 0.1% | |
| 1508420 | 1 | < 0.1% | |
| 1470505 | 1 | < 0.1% | |
| 1819145 | 1 | < 0.1% | |
| 1367691 | 1 | < 0.1% | |
| 1055510 | 1 | < 0.1% | |
| 1855227 | 1 | < 0.1% | |
| 2110369 | 1 | < 0.1% | |
| 1991449 | 1 | < 0.1% | |
| 2128666 | 1 | < 0.1% | |
| Other values (98816) | 98816 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1000008 | 1 | < 0.1% | |
| 1000013 | 1 | < 0.1% | |
| 1000015 | 1 | < 0.1% | |
| 1000038 | 1 | < 0.1% | |
| 1000059 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2193542 | 1 | < 0.1% | |
| 2193538 | 1 | < 0.1% | |
| 2193522 | 1 | < 0.1% | |
| 2193499 | 1 | < 0.1% | |
| 2193485 | 1 | < 0.1% |
age
Real number (ℝ≥0)
| Distinct count | 101 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 37.212646469552546 |
|---|---|
| Minimum | 13 |
| Maximum | 113 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 13 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 20 |
| median | 28 |
| Q3 | 50 |
| 95-th percentile | 89 |
| Maximum | 113 |
| Range | 100 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 22.5242197 |
|---|---|
| Coefficient of variation (CV) | 0.6052840051 |
| Kurtosis | 1.581874703 |
| Mean | 37.21264647 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 1.418907131 |
| Sum | 3677577 |
| Variance | 507.3404729 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 18 | 5196 | 5.3% | |
| 23 | 4402 | 4.5% | |
| 19 | 4390 | 4.4% | |
| 20 | 3768 | 3.8% | |
| 21 | 3670 | 3.7% | |
| 25 | 3636 | 3.7% | |
| 17 | 3281 | 3.3% | |
| 16 | 3086 | 3.1% | |
| 22 | 3032 | 3.1% | |
| 24 | 2827 | 2.9% | |
| Other values (91) | 61538 | 62.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 13 | 484 | 0.5% | |
| 14 | 1925 | 1.9% | |
| 15 | 2617 | 2.6% | |
| 16 | 3086 | 3.1% | |
| 17 | 3281 | 3.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 113 | 196 | 0.2% | |
| 112 | 18 | < 0.1% | |
| 111 | 17 | < 0.1% | |
| 110 | 14 | < 0.1% | |
| 109 | 9 | < 0.1% |
dob_day
Real number (ℝ≥0)
| Distinct count | 31 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.533108696092121 |
|---|---|
| Minimum | 1 |
| Maximum | 31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 14 |
| Q3 | 22 |
| 95-th percentile | 29 |
| Maximum | 31 |
| Range | 30 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 9.013865233 |
|---|---|
| Coefficient of variation (CV) | 0.6202296715 |
| Kurtosis | -1.188614701 |
| Mean | 14.5331087 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.1076041513 |
| Sum | 1436249 |
| Variance | 81.24976644 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 7876 | 8.0% | |
| 10 | 4027 | 4.1% | |
| 15 | 3551 | 3.6% | |
| 5 | 3539 | 3.6% | |
| 12 | 3407 | 3.4% | |
| 2 | 3391 | 3.4% | |
| 3 | 3286 | 3.3% | |
| 20 | 3262 | 3.3% | |
| 17 | 3261 | 3.3% | |
| 25 | 3213 | 3.3% | |
| Other values (21) | 60013 | 60.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 7876 | 8.0% | |
| 2 | 3391 | 3.4% | |
| 3 | 3286 | 3.3% | |
| 4 | 3212 | 3.3% | |
| 5 | 3539 | 3.6% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 31 | 1506 | 1.5% | |
| 30 | 2526 | 2.6% | |
| 29 | 2502 | 2.5% | |
| 28 | 2944 | 3.0% | |
| 27 | 2753 | 2.8% |
dob_year
Real number (ℝ≥0)
| Distinct count | 101 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1975.7873535304475 |
|---|---|
| Minimum | 1900 |
| Maximum | 2000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 1900 |
|---|---|
| 5-th percentile | 1924 |
| Q1 | 1963 |
| median | 1985 |
| Q3 | 1993 |
| 95-th percentile | 1998 |
| Maximum | 2000 |
| Range | 100 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 22.5242197 |
|---|---|
| Coefficient of variation (CV) | 0.01140012343 |
| Kurtosis | 1.581874703 |
| Mean | 1975.787354 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | -1.418907131 |
| Sum | 195259161 |
| Variance | 507.3404729 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1995 | 5196 | 5.3% | |
| 1990 | 4402 | 4.5% | |
| 1994 | 4390 | 4.4% | |
| 1993 | 3768 | 3.8% | |
| 1992 | 3670 | 3.7% | |
| 1988 | 3636 | 3.7% | |
| 1996 | 3281 | 3.3% | |
| 1997 | 3086 | 3.1% | |
| 1991 | 3032 | 3.1% | |
| 1989 | 2827 | 2.9% | |
| Other values (91) | 61538 | 62.3% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1900 | 196 | 0.2% | |
| 1901 | 18 | < 0.1% | |
| 1902 | 17 | < 0.1% | |
| 1903 | 14 | < 0.1% | |
| 1904 | 9 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2000 | 484 | 0.5% | |
| 1999 | 1925 | 1.9% | |
| 1998 | 2617 | 2.6% | |
| 1997 | 3086 | 3.1% | |
| 1996 | 3281 | 3.3% |
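The high-correlation warning between age and dob_year can be understood from the summary statistics alone: the two variables share the same standard deviation, kurtosis and value counts, and their skewness differs only in sign. These figures are consistent with age being a constant reference year minus dob_year on every row. The sketch below checks that from the reported sums and extremes; the reference year is an inference from the numbers, not something the report states, and the dictionary layout is illustrative:

```python
# Values copied from the age and dob_year tables of this report.
n = 98826
age = {"sum": 3677577, "min": 13, "max": 113,
       "std": 22.5242197, "skew": 1.418907131}
dob_year = {"sum": 195259161, "min": 1900, "max": 2000,
            "std": 22.5242197, "skew": -1.418907131}

# If age + dob_year equals the same constant on every row,
# the two sums must add up to that constant times n, with no remainder.
reference_year, remainder = divmod(age["sum"] + dob_year["sum"], n)

# The extremes should pair up the same way (youngest with latest birth year).
youngest_check = age["min"] + dob_year["max"]
oldest_check = age["max"] + dob_year["min"]

print(reference_year, remainder, youngest_check, oldest_check)
```

A perfectly linear relationship with slope -1 also explains why the standard deviations match and the skewness flips sign.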
dob_month
Real number (ℝ≥0)
| Distinct count | 12 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.284753000222613 |
|---|---|
| Minimum | 1 |
| Maximum | 12 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 6 |
| Q3 | 9 |
| 95-th percentile | 12 |
| Maximum | 12 |
| Range | 11 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.529431409 |
|---|---|
| Coefficient of variation (CV) | 0.5615863358 |
| Kurtosis | -1.240311668 |
| Mean | 6.284753 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.03091267457 |
| Sum | 621097 |
| Variance | 12.45688607 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 11737 | 11.9% | |
| 10 | 8466 | 8.6% | |
| 5 | 8260 | 8.4% | |
| 8 | 8255 | 8.4% | |
| 3 | 8095 | 8.2% | |
| 7 | 8006 | 8.1% | |
| 9 | 7923 | 8.0% | |
| 12 | 7883 | 8.0% | |
| 4 | 7794 | 7.9% | |
| 2 | 7617 | 7.7% | |
| Other values (2) | 14790 | 15.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 11737 | 11.9% | |
| 2 | 7617 | 7.7% | |
| 3 | 8095 | 8.2% | |
| 4 | 7794 | 7.9% | |
| 5 | 8260 | 8.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 12 | 7883 | 8.0% | |
| 11 | 7196 | 7.3% | |
| 10 | 8466 | 8.6% | |
| 9 | 7923 | 8.0% | |
| 8 | 8255 | 8.4% |
gender
Categorical
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 772.2 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| male | 58574 | 59.3% | |
| female | 40252 | 40.7% |
Length
| Max length | 6 |
|---|---|
| Mean length | 4.814603444 |
| Min length | 4 |
| Value | Count | Frequency (%) | |
| Lowercase_Letter | 5 | 100.0% |
| Value | Count | Frequency (%) | |
| Latin | 5 | 100.0% |
| Value | Count | Frequency (%) | |
| ASCII | 5 | 100.0% |
tenure
Real number (ℝ≥0)
| Distinct count | 2418 |
|---|---|
| Unique (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 535.6497581608079 |
|---|---|
| Minimum | 0.0 |
| Maximum | 3139.0 |
| Zeros | 70 |
| Zeros (%) | 0.1% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 47 |
| Q1 | 226 |
| median | 412 |
| Q3 | 673 |
| 95-th percentile | 1567 |
| Maximum | 3139 |
| Range | 3139 |
| Interquartile range (IQR) | 447 |
Descriptive statistics
| Standard deviation | 454.2584231 |
|---|---|
| Coefficient of variation (CV) | 0.8480512054 |
| Kurtosis | 2.195382887 |
| Mean | 535.6497582 |
| Median Absolute Deviation (MAD) | 212 |
| Skewness | 1.530832595 |
| Sum | 52936123 |
| Variance | 206350.7149 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 300 | 173 | 0.2% | |
| 303 | 170 | 0.2% | |
| 272 | 163 | 0.2% | |
| 242 | 163 | 0.2% | |
| 297 | 161 | 0.2% | |
| 257 | 161 | 0.2% | |
| 280 | 160 | 0.2% | |
| 285 | 160 | 0.2% | |
| 278 | 158 | 0.2% | |
| 284 | 158 | 0.2% | |
| Other values (2408) | 97199 | 98.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 70 | 0.1% | |
| 1 | 60 | 0.1% | |
| 2 | 72 | 0.1% | |
| 3 | 79 | 0.1% | |
| 4 | 86 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3139 | 3 | < 0.1% | |
| 3129 | 1 | < 0.1% | |
| 3128 | 1 | < 0.1% | |
| 3101 | 1 | < 0.1% | |
| 3019 | 1 | < 0.1% |
friend_count
Real number (ℝ≥0)
| Distinct count | 2561 |
|---|---|
| Unique (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 196.37403112541233 |
|---|---|
| Minimum | 0 |
| Maximum | 4923 |
| Zeros | 1962 |
| Zeros (%) | 2.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 31 |
| median | 82 |
| Q3 | 206 |
| 95-th percentile | 720 |
| Maximum | 4923 |
| Range | 4923 |
| Interquartile range (IQR) | 175 |
Descriptive statistics
| Standard deviation | 387.4634749 |
|---|---|
| Coefficient of variation (CV) | 1.973089174 |
| Kurtosis | 50.08442141 |
| Mean | 196.3740311 |
| Median Absolute Deviation (MAD) | 64 |
| Skewness | 6.059131774 |
| Sum | 19406860 |
| Variance | 150127.9444 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1962 | 2.0% | |
| 1 | 1815 | 1.8% | |
| 2 | 1116 | 1.1% | |
| 3 | 860 | 0.9% | |
| 5 | 785 | 0.8% | |
| 4 | 747 | 0.8% | |
| 10 | 737 | 0.7% | |
| 24 | 732 | 0.7% | |
| 6 | 720 | 0.7% | |
| 8 | 718 | 0.7% | |
| Other values (2551) | 88634 | 89.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1962 | 2.0% | |
| 1 | 1815 | 1.8% | |
| 2 | 1116 | 1.1% | |
| 3 | 860 | 0.9% | |
| 4 | 747 | 0.8% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4923 | 1 | < 0.1% | |
| 4917 | 1 | < 0.1% | |
| 4863 | 1 | < 0.1% | |
| 4845 | 1 | < 0.1% | |
| 4844 | 1 | < 0.1% |
friendships_initiated
Real number (ℝ≥0)
| Distinct count | 1519 |
|---|---|
| Unique (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 107.48005585574646 |
|---|---|
| Minimum | 0 |
| Maximum | 4144 |
| Zeros | 2994 |
| Zeros (%) | 3.0% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 17 |
| median | 46 |
| Q3 | 117 |
| 95-th percentile | 418 |
| Maximum | 4144 |
| Range | 4144 |
| Interquartile range (IQR) | 100 |
Descriptive statistics
| Standard deviation | 188.8615806 |
|---|---|
| Coefficient of variation (CV) | 1.757177917 |
| Kurtosis | 42.53201072 |
| Mean | 107.4800559 |
| Median Absolute Deviation (MAD) | 36 |
| Skewness | 5.151208978 |
| Sum | 10621824 |
| Variance | 35668.69664 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2994 | 3.0% | |
| 1 | 2210 | 2.2% | |
| 2 | 1547 | 1.6% | |
| 3 | 1354 | 1.4% | |
| 4 | 1348 | 1.4% | |
| 6 | 1325 | 1.3% | |
| 5 | 1325 | 1.3% | |
| 11 | 1317 | 1.3% | |
| 8 | 1312 | 1.3% | |
| 13 | 1276 | 1.3% | |
| Other values (1509) | 82818 | 83.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 2994 | 3.0% | |
| 1 | 2210 | 2.2% | |
| 2 | 1547 | 1.6% | |
| 3 | 1354 | 1.4% | |
| 4 | 1348 | 1.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4144 | 1 | < 0.1% | |
| 3654 | 1 | < 0.1% | |
| 3594 | 1 | < 0.1% | |
| 3538 | 1 | < 0.1% | |
| 3415 | 1 | < 0.1% |
likes
Real number (ℝ≥0)
| Distinct count | 2921 |
|---|---|
| Unique (%) | 3.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 156.1117620869002 |
|---|---|
| Minimum | 0 |
| Maximum | 25111 |
| Zeros | 22285 |
| Zeros (%) | 22.5% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 11 |
| Q3 | 81 |
| 95-th percentile | 726 |
| Maximum | 25111 |
| Range | 25111 |
| Interquartile range (IQR) | 80 |
Descriptive statistics
| Standard deviation | 572.5535042 |
|---|---|
| Coefficient of variation (CV) | 3.667587225 |
| Kurtosis | 200.4028959 |
| Mean | 156.1117621 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 11.02417951 |
| Sum | 15427901 |
| Variance | 327817.5152 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 22285 | 22.5% | |
| 1 | 6916 | 7.0% | |
| 2 | 4428 | 4.5% | |
| 3 | 3235 | 3.3% | |
| 4 | 2503 | 2.5% | |
| 5 | 2025 | 2.0% | |
| 6 | 1804 | 1.8% | |
| 7 | 1615 | 1.6% | |
| 8 | 1430 | 1.4% | |
| 9 | 1379 | 1.4% | |
| Other values (2911) | 51206 | 51.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 22285 | 22.5% | |
| 1 | 6916 | 7.0% | |
| 2 | 4428 | 4.5% | |
| 3 | 3235 | 3.3% | |
| 4 | 2503 | 2.5% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 25111 | 1 | < 0.1% | |
| 21652 | 1 | < 0.1% | |
| 16732 | 1 | < 0.1% | |
| 16583 | 1 | < 0.1% | |
| 14799 | 1 | < 0.1% |
likes_received
Real number (ℝ≥0)
| Distinct count | 2676 |
|---|---|
| Unique (%) | 2.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 142.66543217371947 |
|---|---|
| Minimum | 0 |
| Maximum | 261197 |
| Zeros | 24400 |
| Zeros (%) | 24.7% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 8 |
| Q3 | 59 |
| 95-th percentile | 560.75 |
| Maximum | 261197 |
| Range | 261197 |
| Interquartile range (IQR) | 58 |
Descriptive statistics
| Standard deviation | 1388.990063 |
|---|---|
| Coefficient of variation (CV) | 9.7359959 |
| Kurtosis | 17362.4551 |
| Mean | 142.6654322 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 112.0153748 |
| Sum | 14099054 |
| Variance | 1929293.394 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 24400 | 24.7% | |
| 1 | 7291 | 7.4% | |
| 2 | 4537 | 4.6% | |
| 3 | 3342 | 3.4% | |
| 4 | 2663 | 2.7% | |
| 5 | 2367 | 2.4% | |
| 6 | 1868 | 1.9% | |
| 7 | 1678 | 1.7% | |
| 8 | 1535 | 1.6% | |
| 9 | 1349 | 1.4% | |
| Other values (2666) | 47796 | 48.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 24400 | 24.7% | |
| 1 | 7291 | 7.4% | |
| 2 | 4537 | 4.6% | |
| 3 | 3342 | 3.4% | |
| 4 | 2663 | 2.7% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 261197 | 1 | < 0.1% | |
| 178166 | 1 | < 0.1% | |
| 152014 | 1 | < 0.1% | |
| 106025 | 1 | < 0.1% | |
| 82623 | 1 | < 0.1% |
mobile_likes
Real number (ℝ≥0)
| Distinct count | 2394 |
|---|---|
| Unique (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 106.14784570861919 |
|---|---|
| Minimum | 0 |
| Maximum | 25111 |
| Zeros | 35002 |
| Zeros (%) | 35.4% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 46 |
| 95-th percentile | 482 |
| Maximum | 25111 |
| Range | 25111 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 445.4947031 |
|---|---|
| Coefficient of variation (CV) | 4.196926467 |
| Kurtosis | 360.8353367 |
| Mean | 106.1478457 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 14.16087797 |
| Sum | 10490167 |
| Variance | 198465.5305 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 35002 | 35.4% | |
| 1 | 6287 | 6.4% | |
| 2 | 3930 | 4.0% | |
| 3 | 2910 | 2.9% | |
| 4 | 2262 | 2.3% | |
| 5 | 1790 | 1.8% | |
| 6 | 1597 | 1.6% | |
| 7 | 1395 | 1.4% | |
| 8 | 1210 | 1.2% | |
| 9 | 1148 | 1.2% | |
| Other values (2384) | 41295 | 41.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 35002 | 35.4% | |
| 1 | 6287 | 6.4% | |
| 2 | 3930 | 4.0% | |
| 3 | 2910 | 2.9% | |
| 4 | 2262 | 2.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 25111 | 1 | < 0.1% | |
| 21652 | 1 | < 0.1% | |
| 16732 | 1 | < 0.1% | |
| 14039 | 1 | < 0.1% | |
| 13529 | 1 | < 0.1% |
mobile_likes_received
Real number (ℝ≥0)
| Distinct count | 2002 |
|---|---|
| Unique (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 84.11883512435999 |
|---|---|
| Minimum | 0 |
| Maximum | 138561 |
| Zeros | 29964 |
| Zeros (%) | 30.3% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 4 |
| Q3 | 33 |
| 95-th percentile | 317 |
| Maximum | 138561 |
| Range | 138561 |
| Interquartile range (IQR) | 33 |
Descriptive statistics
| Standard deviation | 840.5433663 |
|---|---|
| Coefficient of variation (CV) | 9.992332455 |
| Kurtosis | 15502.11262 |
| Mean | 84.11883512 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 107.4720743 |
| Sum | 8313128 |
| Variance | 706513.1506 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 29964 | 30.3% | |
| 1 | 8227 | 8.3% | |
| 2 | 4942 | 5.0% | |
| 3 | 3598 | 3.6% | |
| 4 | 2936 | 3.0% | |
| 5 | 2382 | 2.4% | |
| 6 | 2017 | 2.0% | |
| 7 | 1744 | 1.8% | |
| 8 | 1520 | 1.5% | |
| 9 | 1433 | 1.5% | |
| Other values (1992) | 40063 | 40.5% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 29964 | 30.3% | |
| 1 | 8227 | 8.3% | |
| 2 | 4942 | 5.0% | |
| 3 | 3598 | 3.6% | |
| 4 | 2936 | 3.0% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 138561 | 1 | < 0.1% | |
| 131244 | 1 | < 0.1% | |
| 89911 | 1 | < 0.1% | |
| 73333 | 1 | < 0.1% | |
| 43410 | 1 | < 0.1% |
www_likes
Real number (ℝ≥0)
| Distinct count | 1724 |
|---|---|
| Unique (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 49.96386578430777 |
|---|---|
| Minimum | 0 |
| Maximum | 14865 |
| Zeros | 60935 |
| Zeros (%) | 61.7% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 7 |
| 95-th percentile | 208 |
| Maximum | 14865 |
| Range | 14865 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 285.7514889 |
|---|---|
| Coefficient of variation (CV) | 5.719162926 |
| Kurtosis | 448.7421466 |
| Mean | 49.96386578 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 16.90611438 |
| Sum | 4937729 |
| Variance | 81653.91338 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 60935 | 61.7% | |
| 1 | 4678 | 4.7% | |
| 2 | 2750 | 2.8% | |
| 3 | 1945 | 2.0% | |
| 4 | 1415 | 1.4% | |
| 5 | 1201 | 1.2% | |
| 6 | 1075 | 1.1% | |
| 7 | 895 | 0.9% | |
| 8 | 790 | 0.8% | |
| 9 | 755 | 0.8% | |
| Other values (1714) | 22387 | 22.7% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 60935 | 61.7% | |
| 1 | 4678 | 4.7% | |
| 2 | 2750 | 2.8% | |
| 3 | 1945 | 2.0% | |
| 4 | 1415 | 1.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 14865 | 1 | < 0.1% | |
| 12903 | 1 | < 0.1% | |
| 11077 | 1 | < 0.1% | |
| 10763 | 1 | < 0.1% | |
| 10627 | 1 | < 0.1% |
www_likes_received
Real number (ℝ≥0)
| Distinct count | 1634 |
|---|---|
| Unique (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.54655657418088 |
|---|---|
| Minimum | 0 |
| Maximum | 129953 |
| Zeros | 36825 |
| Zeros (%) | 37.3% |
| Memory size | 772.2 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 20 |
| 95-th percentile | 227 |
| Maximum | 129953 |
| Range | 129953 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 601.8804964 |
|---|---|
| Coefficient of variation (CV) | 10.28037397 |
| Kurtosis | 23781.41499 |
| Mean | 58.54655657 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 126.1906692 |
| Sum | 5785922 |
| Variance | 362260.132 |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 36825 | 37.3% | |
| 1 | 8497 | 8.6% | |
| 2 | 5096 | 5.2% | |
| 3 | 3582 | 3.6% | |
| 4 | 2823 | 2.9% | |
| 5 | 2313 | 2.3% | |
| 6 | 1916 | 1.9% | |
| 7 | 1596 | 1.6% | |
| 8 | 1442 | 1.5% | |
| 9 | 1369 | 1.4% | |
| Other values (1624) | 33367 | 33.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 36825 | 37.3% | |
| 1 | 8497 | 8.6% | |
| 2 | 5096 | 5.2% | |
| 3 | 3582 | 3.6% | |
| 4 | 2823 | 2.9% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 129953 | 1 | < 0.1% | |
| 62103 | 1 | < 0.1% | |
| 39605 | 1 | < 0.1% | |
| 39213 | 1 | < 0.1% | |
| 34039 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
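The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input has zero variance and makes r undefined.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)

# Shifting either variable, or rescaling it by a positive factor, leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and (positive) scale mentioned above.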
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
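As a sketch of that definition, the snippet below ranks both variables (tied values share the average of their positions, which is one common convention and an assumption here) and then applies the Pearson formula to the ranks. The helper names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # monotonic but not linear
rho = spearman_rho(x, y)     # perfect monotonic association
r = pearson_r(x, y)          # noticeably below 1 for the same data
print(rho, r)
```

The cubic example shows the difference in practice: the ranks of x and y coincide, so ρ reaches its maximum even though the relationship is not linear.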
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
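The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with the caveat that libraries such as SciPy default to the tau-b variant, which adjusts for ties and can therefore give different values on tied data; the function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

This O(n²) loop is fine for illustration; for a dataset the size of the one profiled here, a library implementation would be the practical choice.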
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | df_index | userid | age | dob_day | dob_year | dob_month | gender | tenure | friend_count | friendships_initiated | likes | likes_received | mobile_likes | mobile_likes_received | www_likes | www_likes_received |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2094382 | 14 | 19 | 1999 | 11 | male | 266.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1192601 | 14 | 2 | 1999 | 11 | female | 6.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 2083884 | 14 | 16 | 1999 | 11 | male | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | 1203168 | 14 | 25 | 1999 | 12 | female | 93.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | 1733186 | 14 | 4 | 1999 | 12 | male | 82.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 5 | 1524765 | 14 | 1 | 1999 | 12 | male | 15.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 6 | 1136133 | 13 | 14 | 2000 | 1 | male | 12.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 7 | 1680361 | 13 | 4 | 2000 | 1 | female | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 8 | 1365174 | 13 | 1 | 2000 | 1 | male | 81.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 9 | 1712567 | 13 | 2 | 2000 | 2 | male | 171.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Last rows
| | df_index | userid | age | dob_day | dob_year | dob_month | gender | tenure | friend_count | friendships_initiated | likes | likes_received | mobile_likes | mobile_likes_received | www_likes | www_likes_received |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 98816 | 98993 | 1654565 | 19 | 15 | 1994 | 8 | male | 394.0 | 4538 | 4144 | 4501 | 15088 | 4435 | 5961 | 66 | 9127 |
| 98817 | 98994 | 2063006 | 20 | 4 | 1993 | 1 | female | 402.0 | 1988 | 332 | 7351 | 106025 | 7248 | 73333 | 103 | 32692 |
| 98818 | 98995 | 1132164 | 20 | 9 | 1993 | 10 | female | 699.0 | 3611 | 973 | 4507 | 7768 | 4414 | 6909 | 93 | 859 |
| 98819 | 98996 | 1668695 | 24 | 25 | 1989 | 4 | female | 182.0 | 2938 | 1272 | 6018 | 17765 | 5843 | 11708 | 175 | 6057 |
| 98820 | 98997 | 1458985 | 28 | 14 | 1985 | 12 | female | 290.0 | 2218 | 1618 | 4626 | 10268 | 4290 | 4250 | 336 | 6018 |
| 98821 | 98998 | 1268299 | 68 | 4 | 1945 | 4 | female | 541.0 | 2118 | 341 | 3996 | 18089 | 3505 | 11887 | 491 | 6202 |
| 98822 | 98999 | 1256153 | 18 | 12 | 1995 | 3 | female | 21.0 | 1968 | 1720 | 4401 | 13412 | 4399 | 10592 | 2 | 2820 |
| 98823 | 99000 | 1195943 | 15 | 10 | 1998 | 5 | female | 111.0 | 2002 | 1524 | 11959 | 12554 | 11959 | 11462 | 0 | 1092 |
| 98824 | 99001 | 1468023 | 23 | 11 | 1990 | 4 | female | 416.0 | 2560 | 185 | 4506 | 6516 | 4506 | 5760 | 0 | 756 |
| 98825 | 99002 | 1397896 | 39 | 15 | 1974 | 5 | female | 397.0 | 2049 | 768 | 9410 | 12443 | 9410 | 9530 | 0 | 2913 |